NTCIR-7 Patent Translation Experiments at Hitachi
Authors
Abstract
Statistical machine translation (SMT) is a new paradigm in machine translation that enables high-quality translation. However, many translation errors occur in complex and compound sentences because SMT lacks grammatical knowledge about the global structure of a sentence. We adopt a pre-editing method that divides sentences into clauses, and we translate these clauses using the Moses SMT engine. The translation accuracy, measured by BLEU, was 29.33%, so pre-editing has only a small effect. Translation quality is degraded because each clause is translated without information about the other clauses, which changes the word order. We also performed an experiment to determine the optimal distortion-limit parameter of Moses. The maximum BLEU score was 29.45 for English-Japanese patent translation when the distortion limit was 20 instead of -1.
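The pre-editing idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not give the actual clause-splitting rules, so the boundary pattern `CLAUSE_BOUNDARY` is hypothetical, and `toy_translate` is a stand-in for a call to an SMT engine such as Moses, not its API.

```python
import re
from typing import Callable, List

# Hypothetical clause boundary: a comma followed by a conjunction or
# relative word. The paper's real pre-editing rules are not stated.
CLAUSE_BOUNDARY = re.compile(r",\s+(?=(?:and|but|which|because|so)\b)")


def split_into_clauses(sentence: str) -> List[str]:
    """Split a sentence into clauses at the hypothetical boundaries."""
    return [c.strip() for c in CLAUSE_BOUNDARY.split(sentence) if c.strip()]


def translate_by_clause(sentence: str,
                        translate_clause: Callable[[str], str]) -> str:
    """Translate each clause independently and join the results in order.

    Each clause is translated with no knowledge of its neighbours, which
    is the limitation the abstract identifies as hurting word order.
    """
    clauses = split_into_clauses(sentence)
    return " ".join(translate_clause(c) for c in clauses)


def toy_translate(clause: str) -> str:
    """Placeholder translator (upper-cases text); stands in for an SMT call."""
    return clause.upper()


if __name__ == "__main__":
    src = "The valve opens, and the fluid flows, which cools the plate."
    print(split_into_clauses(src))
    print(translate_by_clause(src, toy_translate))
```

Because `translate_by_clause` simply concatenates independent outputs, any reordering that should cross a clause boundary cannot happen, which is consistent with the small gain the abstract reports.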
Similar resources
NTCIR-7 Patent Mining Experiments at Hitachi
This paper reports results of our experiments on the automatic assignment of patent classification to research paper abstracts. We applied K-Nearest Neighbors Methods and three kinds of query term expansion methods using a research paper abstract dataset and a patent document dataset to improve the classification accuracy. The results show that these query expansion methods slightly improve cla...
NTCIR-7 Experiments in Patent Translation based on Open Source Statistical Machine Translation Tools
This paper describes our experimental methods and results in the NTCIR-7 Patent Translation Task [1]. As the first step of our research in machine translation, we integrated several open source tools to build a statistical translation model. The experimental results demonstrated that we still need to improve performance and efficiency in both model training and testing.
NTCIR-5 Patent Retrieval Experiments at Hitachi
In NTCIR-5, we used five retrieval methods proposed in NTCIR-4: (1) query term weighting using only document frequency, (2) stopword deletion, (3) two-stage patent retrieval, (4) term weighting considering “measurement terms”, and (5) related term expansion. In this paper, we compare the retrieval accuracy for two test sets: 34 main queries in NTCIR-4 and 1189 new queries in NTCIR-5. Then, we e...
Patent SMT Based on Combined Phrases for NTCIR-7
In this paper, we describe a combined phrase approach to the Statistical Machine Translation of Japanese patents into English. To resolve the segmentation errors caused by the rich OOV (out-of-vocabulary) words in the patent texts, the character based translation phrases are first employed. Then the word based translation phrases are established to utilize the dependable word level information....